My Data Source

My analysis uses Georgia absentee voting data files and election results files downloaded directly from the Georgia Secretary of State’s office website. I downloaded all available absentee data from 2014 to 2023. Please note that the website lists 2013 as an option, but the files for that year cannot be downloaded, so I have excluded that year.

Data Story: The Evolution of Absentee Voting in Georgia

Data Preparation – Examine the Data Structure

A typical CSV file for an absentee voting record is named with the date and month. Each month’s folder contains data for all or only some of Georgia’s counties. Each county’s CSV file contains numerous columns, including:

  • County name
  • Voter name
  • Voter address
  • Ballot status
  • Ballot style
  • Ballot return date
  • Ballot issued date
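A quick way to confirm this structure is to read one file and list its columns. The sketch below is only illustrative: it writes a tiny made-up CSV to a temporary file, and the headers are assumptions modeled on the list above rather than the exact headers in the state files. Note that `read.csv` converts spaces in headers to dots, so “Ballot Return Date” becomes `Ballot.Return.Date`.

```r
# Toy CSV standing in for one county file (invented rows, assumed headers)
toy_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "County,Voter Name,Ballot Status,Ballot Style,Ballot Issued Date,Ballot Return Date",
  "APPLING,VOTER A,A,MAILED,10/01/2020,10/09/2020",
  "APPLING,VOTER B,A,IN PERSON,10/12/2020,10/12/2020",
  "APPLING,VOTER C,,MAILED,10/02/2020,"
), toy_csv)

# Treat empty cells as NA so missing values are easy to detect later
ballots <- read.csv(toy_csv, header = TRUE, na.strings = c("", "NA"))

# Column names after read.csv has replaced spaces with dots
column_names <- names(ballots)
print(column_names)

# Column types and a preview of the values
str(ballots)
```

Reading empty cells as `NA` is a design choice made here so that an unreturned ballot shows up as a missing return date instead of an empty string.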

For each CSV file, I have executed the following data preprocessing steps using R code:

- Extracted the county name.

- Filtered out rows with an empty “Ballot Return Date” column. A non-empty return date indicates a valid ballot that has been successfully counted, so the remaining rows give the total valid vote count for each county in a specific month and year. I counted these rows for each CSV file, then summed the counts across all CSV files in each year’s folder to determine the total valid votes for each year.

- Filtered the remaining rows on the “Ballot Style” column to exclude those marked “IN PERSON.” This isolates valid absentee votes, including mail-in and electronic votes. I summed these counts across all CSV files in the same way to calculate the total valid absentee votes for each year in Georgia.
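The two filters can be illustrated on a handful of rows. This is a minimal sketch on invented data; the style labels `MAILED` and `ELECTRONIC` are assumptions used only for illustration, while `IN PERSON` and the dotted column names follow the description above.

```r
# Invented rows for illustration; column names follow read.csv's dotted form
ballots <- data.frame(
  County = c("FULTON", "FULTON", "FULTON", "FULTON"),
  Ballot.Style = c("MAILED", "IN PERSON", "ELECTRONIC", "MAILED"),
  Ballot.Return.Date = c("10/09/2020", "10/12/2020", "10/15/2020", NA),
  stringsAsFactors = FALSE
)

# Step 1: a ballot counts as returned when its return date is present
returned <- !is.na(ballots$Ballot.Return.Date) & ballots$Ballot.Return.Date != ""

# Step 2: flag ballots that were not cast in person
not_in_person <- !(ballots$Ballot.Style %in% "IN PERSON")

# Total valid votes, and valid votes that were not cast in person
total_votes <- sum(returned)
absentee_votes <- sum(returned & not_in_person)

print(c(total = total_votes, absentee = absentee_votes))
```

Using `%in%` instead of `!=` means a row with a missing ballot style is treated as not in person rather than turning the whole count into `NA`.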

Data Vetting

Source of Data: This is a primary source, as it comes directly from the Georgia Secretary of State’s office. It is also used in many articles and election reports and by reputable organizations such as the Atlanta Journal-Constitution.

Time Period: The website lists absentee data files from 2013 to 2023, but the 2013 files are not downloadable. This data story therefore uses data from 2014 to 2023.

Number of Records: The estimated total number of records in the Georgia Absentee Voter Records database from 2014 to 2023 is approximately 314,720.
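A record count like this can be obtained by summing the row counts of every CSV file. The sketch below demonstrates the idea on a temporary folder holding two tiny invented files; pointing `toy_dir` at the real download folder is left to the reader, and the figures it prints here have nothing to do with the estimate above.

```r
# Temporary folder with two tiny invented CSV files standing in for county files
toy_dir <- tempfile("absentee_toy")
dir.create(toy_dir)
write.csv(data.frame(County = c("A", "A")),
          file.path(toy_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(County = c("B", "B", "B")),
          file.path(toy_dir, "b.csv"), row.names = FALSE)

# Find every CSV file below the folder, including sub-folders
csv_files <- list.files(toy_dir, pattern = "\\.csv$",
                        full.names = TRUE, recursive = TRUE)

# Count the data rows in each file, then add them up
rows_per_file <- vapply(csv_files,
                        function(f) nrow(read.csv(f, header = TRUE)),
                        integer(1))
total_records <- sum(rows_per_file)
print(total_records)
```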

Duplicates: After sampling several CSV files from different years and counties, I found no duplicate entries in terms of voter names and addresses. This suggests that the dataset may have a high degree of integrity in this respect. However, this is a preliminary finding based on a limited sample and should be considered an estimate rather than a conclusive result.
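One way to run such a duplicate check is with base R's `duplicated()`. The sketch below uses invented rows, and the column names `Voter.Name` and `Voter.Address` are hypothetical stand-ins for whatever the real files call those fields.

```r
# Invented rows; Voter.Name and Voter.Address are hypothetical column names
ballots <- data.frame(
  Voter.Name = c("VOTER A", "VOTER B", "VOTER A", "VOTER C"),
  Voter.Address = c("1 MAIN ST", "2 OAK AVE", "1 MAIN ST", "3 ELM RD"),
  stringsAsFactors = FALSE
)

# A row is a duplicate when an earlier row has the same name and address
key_cols <- c("Voter.Name", "Voter.Address")
is_dup <- duplicated(ballots[, key_cols])

n_duplicates <- sum(is_dup)
duplicate_rows <- ballots[is_dup, ]
print(n_duplicates)
print(duplicate_rows)
```

Because `duplicated()` flags only the second and later occurrences, `n_duplicates` counts the surplus rows, not every row involved in a match.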

Consistency Issues: I ran a sample check on county names, addresses, and similar fields. This preliminary check revealed no significant inconsistencies in the spelling or formatting of these fields. However, as with any large data set, there might be minor discrepancies not captured in the sample.
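A simple consistency check is to normalize a field and see whether several raw spellings collapse to the same value. The following sketch does this for county names; the values are invented to show what an inconsistency would look like and do not come from the actual files.

```r
# Invented county values with deliberate case and whitespace differences
county_raw <- c("Fulton", "FULTON", " fulton ", "DeKalb", "DEKALB", "Cobb")

# Normalize: trim surrounding whitespace and convert to upper case
county_clean <- toupper(trimws(county_raw))

# For each normalized name, count how many distinct raw spellings map to it
variants <- tapply(county_raw, county_clean, function(x) length(unique(x)))

# Names with more than one raw spelling are formatted inconsistently
inconsistent <- names(variants)[variants > 1]
print(inconsistent)
```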

Numeric Fields: Dates align with the relevant election cycles and absentee voting periods, and vote counts appear reasonable, without any outliers suggesting data entry errors. However, this is based on a limited sample and assumes the data set’s overall integrity. For a comprehensive verification, a full-scale analysis using statistical tools would be needed.
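A date check of this kind can be sketched by parsing the date strings and flagging anything that fails to parse or falls outside the study period. The `%m/%d/%Y` format and the sample values below are assumptions for illustration; the real files may use a different date layout.

```r
# Invented return dates; the month/day/year layout is an assumption
return_dates <- c("10/09/2020", "11/03/2020", "13/45/2020", "01/05/1920")

# Values that do not match the format or are not real dates become NA
parsed <- as.Date(return_dates, format = "%m/%d/%Y")
unparseable <- is.na(parsed)

# Dates should fall within the 2014 to 2023 study period
in_range <- !is.na(parsed) &
  parsed >= as.Date("2014-01-01") &
  parsed <= as.Date("2023-12-31")

# Anything unparseable or out of range deserves a closer look
suspicious <- return_dates[unparseable | !in_range]
print(suspicious)
```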

Missing Data: An initial examination revealed minimal instances of missing data. Key fields such as voter names, addresses, and ballot information are predominantly complete.

Questions for Clarification: Questions I would raise include:

“What are the protocols for data entry and verification by the Georgia Secretary of State’s office?”

“Are there any known limitations or biases in the absentee voting data collection process?”
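The share of missing values per column can be measured directly. In this sketch, both `NA` and empty strings count as missing; the rows are invented and `Voter.Name` is a hypothetical column name.

```r
# Invented rows with a mix of NA and empty-string gaps
ballots <- data.frame(
  Voter.Name = c("VOTER A", "VOTER B", "", "VOTER D"),
  Ballot.Style = c("MAILED", NA, "IN PERSON", "MAILED"),
  Ballot.Return.Date = c("10/09/2020", "", NA, "10/15/2020"),
  stringsAsFactors = FALSE
)

# A value is missing when it is NA or contains only whitespace
is_missing <- function(x) is.na(x) | trimws(x) == ""

# Fraction of missing values in each column
missing_share <- sapply(ballots, function(col) mean(is_missing(col)))
print(missing_share)
```

A missing return date is expected for ballots that were never returned, so a high share in that column does not by itself indicate a data quality problem.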

Key Findings: The initial analysis indicates a rise in absentee voting over recent years, marked by considerable variations among different counties. These trends could be shaped by changes in voting laws, demographic shifts, and societal or technological changes.

Data Reproducibility

If you would like to reproduce the analysis presented in this document, you can follow these steps. Please note that the code below is set not to run automatically when you knit this document to HTML. You need to manually execute it if you want to reproduce the results.

Prerequisites:

  1. Ensure that you have R and RStudio installed on your computer.
  2. Download the data files from [source URL] and place them in a directory on your desktop named “absentee.”

Instructions:

  1. Copy and paste the following code chunk into your R environment.

This code produces the Yearly_Absentee_Votes_Summary.csv file.

```{r, eval=FALSE}
library(tools)  # provides file_ext()

# Path to the main directory on your desktop
main_dir <- "~/Desktop/absentee"

# Years to process
years <- 2014:2023

# Initialize a data frame for the yearly results
yearly_results <- data.frame(Year = integer(), Total.Votes = numeric(), Absentee.Votes = numeric(), stringsAsFactors = FALSE)

# Process each year
for (year in years) {
  year_dir <- file.path(main_dir, as.character(year))
  # Running total of all returned ballots for the current year
  total_votes_year <- 0
  # Running total of returned ballots that were not cast in person
  total_votes_year_notinperson <- 0
  # Check if the year directory exists to avoid errors
  if (dir.exists(year_dir)) {
    # List all sub-folders within the year folder
    month_folders <- list.dirs(year_dir, full.names = TRUE, recursive = FALSE)
    # Keep only sub-folders that contain CSV files
    has_csv <- vapply(month_folders, function(folder) any(file_ext(list.files(folder)) == "csv"), logical(1))
    month_folders <- month_folders[has_csv]

    for (month_folder in month_folders) {
      # List of CSV files in the current month folder
      file_list <- list.files(month_folder, pattern = "\\.csv$", full.names = TRUE)

      # Process each file in the month folder
      for (file_name in file_list) {
        # Read the CSV file, treating empty cells as missing values
        data <- read.csv(file_name, header = TRUE, na.strings = c("", "NA"))

        # A ballot counts as returned when 'Ballot Return Date' is non-empty
        returned <- !is.na(data$Ballot.Return.Date)
        total_votes <- sum(returned)
        # Returned ballots whose 'Ballot Style' is not 'IN PERSON'
        absentee_votes <- sum(returned & !(data$Ballot.Style %in% "IN PERSON"))

        # Accumulate the totals for the year
        total_votes_year <- total_votes_year + total_votes
        total_votes_year_notinperson <- total_votes_year_notinperson + absentee_votes
      }
    }

    # Add the year and its totals to the yearly results
    yearly_results <- rbind(yearly_results, data.frame(Year = year, Total.Votes = total_votes_year, Absentee.Votes = total_votes_year_notinperson))
  }
}

# Write the yearly results to a new CSV file
write.csv(yearly_results, file.path(main_dir, "Yearly_Absentee_Votes_Summary.csv"), row.names = FALSE)
```
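Once the yearly totals exist, the absentee share of valid votes follows from a single division. The sketch below uses a small data frame with invented numbers in place of the real summary file; only the column names `Year`, `Total.Votes`, and `Absentee.Votes` are taken from the code above, and `Absentee.Share` is a name introduced here for illustration.

```r
# Invented totals standing in for the real yearly summary
yearly_results <- data.frame(
  Year = c(2018, 2020),
  Total.Votes = c(1000, 2000),
  Absentee.Votes = c(100, 600)
)

# Share of valid votes that were not cast in person; NA when a year has no votes
yearly_results$Absentee.Share <- ifelse(
  yearly_results$Total.Votes > 0,
  yearly_results$Absentee.Votes / yearly_results$Total.Votes,
  NA_real_
)

print(yearly_results)
```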


This code produces the Yearly_Absentee_Votes_Summary By County.csv file.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r, eval=FALSE}
library(tools)  # provides file_ext()

# Path to the main directory on your desktop
main_dir <- "~/Desktop/absentee"

# Years to process
years <- 2014:2023

# Initialize a data frame for the yearly results
yearly_results <- data.frame(Year = integer(), County = character(), Total.Votes = numeric(), Absentee.Votes = numeric(), stringsAsFactors = FALSE)

# Process each year
for (year in years) {
  # Set the directory to the current year's folder
  year_dir <- file.path(main_dir, as.character(year))

  # Named list of per-county vote counts for the current year
  county_votes <- list()

  # Check if the year directory exists to avoid errors
  if (dir.exists(year_dir)) {
    # List all sub-folders within the year folder
    month_folders <- list.dirs(year_dir, full.names = TRUE, recursive = FALSE)

    # Keep only sub-folders that contain CSV files
    has_csv <- vapply(month_folders, function(folder) any(file_ext(list.files(folder)) == "csv"), logical(1))
    month_folders <- month_folders[has_csv]

    for (month_folder in month_folders) {
      # List of CSV files in the current month folder
      file_list <- list.files(month_folder, pattern = "\\.csv$", full.names = TRUE)

      # Process each file in the month folder
      for (file_name in file_list) {
        # Read the CSV file, treating empty cells as missing values
        data <- read.csv(file_name, header = TRUE, na.strings = c("", "NA"))

        # A ballot counts as returned when 'Ballot Return Date' is non-empty
        returned <- !is.na(data$Ballot.Return.Date)
        # Ballots whose 'Ballot Style' is not 'IN PERSON'
        not_in_person <- !(data$Ballot.Style %in% "IN PERSON")

        # Get the unique county names, dropping missing values
        county_name_list <- unique(data$County)
        county_name_list <- county_name_list[!is.na(county_name_list)]

        for (name in county_name_list) {
          if (!(name %in% names(county_votes))) {
            # 1st element is total votes, 2nd element is absentee votes
            county_votes[[name]] <- list(0, 0)
          }

          # Count returned ballots and returned absentee ballots for this county
          in_county <- data$County %in% name
          total_votes <- sum(returned & in_county)
          absentee_votes <- sum(returned & not_in_person & in_county)

          county_votes[[name]][[1]] <- county_votes[[name]][[1]] + total_votes
          county_votes[[name]][[2]] <- county_votes[[name]][[2]] + absentee_votes
        }
      }
    }

    # Loop through the county names collected for this year
    for (county_name in names(county_votes)) {
      # Create a one-row data frame for the current county
      county_data <- data.frame(
        Year = year,
        County = county_name,
        Total.Votes = county_votes[[county_name]][[1]],
        Absentee.Votes = county_votes[[county_name]][[2]],
        stringsAsFactors = FALSE
      )
      # Append the county data to the yearly results
      yearly_results <- rbind(yearly_results, county_data)
    }
  }
}

# Write the yearly results to a new CSV file
write.csv(yearly_results, file.path(main_dir, "Yearly_Absentee_Votes_Summary By County.csv"), row.names = FALSE)
```